Mean Field Theory of Dynamical Systems Driven by External Signals 
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Dynamical systems driven by strong external signals are ubiquituous in nature and engineering. 
Here we study "echo state networks", networks of a large number of randomly connected nodes, 
which represent a simple model of a neural network, and have important applications in machine 
learning. We develop a mean field theory of echo state networks. The dynamics of the network is 
captured by the evolution law, similar to a logistic map, for a single collective variable. When the 
network is driven by many independent external signals, this collective variable reaches a steady 
state. But when the network is driven by a single external signal, the collective variable is nonsta- 
tionnary but can be characterised by its time averaged distribution. The predictions of the mean 
field theory, including the value of the largest Lyaponuov exponent, are compared with the numerical 
integration of the equations of motion. 
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Introduction. Our understanding of non linear dynam- 
ical systems and networks has made tremendous progress 
during the past decades. In most cases the autonomous 
dynamics is studied. The situation where the network 
is strongly driven by an external signal has so far been 
less investigated even though it arises in many different 
contexts in the natural and artificial world. Examples in- 
clude networks of interacting chemicals (proteins, RNA) 
in a cell driven by unpredictable external chemical sig- 
nals; networks of neurons driven by an external sensory 
input; artificial neural networks and their applications in 
machine learning; the response of population dynamics 
and ecological networks to changes in external conditions 
such as the weather; the responses of stock prices to eco- 
nomically significant news such as a company earnings, 
or unemployment numbers. In all these cases taking into 
account the external input is essential if one wants to 
understand correctly the dynamics, both because the ex- 
ternal input is often large (it cannot be treated as a small 
perturbation), and because in some cases the systems it- 
self has been selected according to its response to the 
fluctuating and unpredictable external variables. 

The aim of the present work is to show, through the 
study of a specific but important example, how mean 
field techniques can provide a detailed understanding of 
dynamical networks strongly driven by an external sig- 
nal. In the mean field approach the average feedback of 
the variables on themselves is taken into account through 
a self consistent equation, while the correlations between 
individual variables are neglected. The apparently ex- 
tremely complicated dynamics of the network is thus re- 
duced to much simpler evolution equations for a few col- 
lective variables. Previous applications of the mean field 
approach to dynamical systems (but without including 
an external input), and in particular neural networks, 
include e.g. For previous studies of dynamical sys- 



tems in the presence of external signals (with however 
a quite different emphasiz than in the present work) see 
e.g. @,|. 

The specific system we will consider is taken from the 
field of artificial neural networks. It consists of a network 
of randomly connected idealised neurons evolving in dis- 
crete time, and driven by an external time dependent 
signal, known in the machine learning community as an 
"echo state network" 0, @|, see also the continous time 
analog with no input studied in [4|- Such systems, when 
supplemented by a single linear output layer, fall within 
the class of "reservoir computers" [7H10j and currently 
hold records for several highly non trivial machine learn- 
ing tasks such as time series prediction or some speech 
recognition benchmarks, see e.g. [11] for a review. Be- 
cause of their importance in the machine learning com- 
munity, it is highly desirable to better understand the 
dynamics of these systems. In addition they can serve 
as toy models for investigating the dynamics of neural 
networks, and more generally any recurrent dynamical 
systems, driven by external inputs. 

An echo state network consists of a large number N 
of artificial neurons evolving in discrete time t £ Z. We 
denote by a,i(t) the "activation potential" of neuron i at 
time t. At time i + neuron i sends a signal to the other 
neurons with strength Xi(t + 1) given by 



Xi(t+1) = f( ai (t)) 



(1) 



where the function / is taken to be a sigmoidal func- 
tions, i.e. / is odd, monotonously increasing, has fi- 
nite limit for large a, and its first derivative f'(a) de- 
creases monotonously for positive a. By rescaling x and 
a we can redefine /(a) — > af((3a). We choose the scales 
such that /'(0) = 1 and lim a ^oo f(a) = 1. In the illus- 
trative figures, we choose for / the hyperbolic tangent 
/ (a) = tanh (a). 
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The update rule for the activation potentials is 
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UiS(t) , 



(2) 



where Wij is a time independent coupling matrix which 
gives the strength of the coupling of neuron j to neuron 
i, s(t) is the time dependent external input, and Ui is a 
time independent vector which determines the strength 
with which the input is coupled to neuron i. 

In echo state networks, the Wij and Uj are chosen in- 
dependently at random, except for global scaling factors 
Wij — > awij, Ui — > f3ui. By adjusting these scaling factors 
and by using an optimised linear readout it is possible 
to obtain excellent performance on a variety of machine 
learning tasks. The general heuristic is that the factor a 
should be adjusted for the system to be at the threshold 
of chaos, whereupon the response of the neural network 
to the input is highly complex, but deterministic. Previ- 
ous studies of the dynamics of echo state networks have 
aimed to understand how different dynamical regimes are 
related to changes in their information processing capa- 
bility Most closely connected to the present work 
is where, based on earlier work on feedforward net- 
works , the information processing capability of echo 
state networks could be studied in the limit where the 
number N of neurons tends to infinity. 

Mean Field approximation. The key insight behind 
the present work is to make the assumption that, at each 
time t, the Xi(t) behave as independent identically dis- 
tributed random variables which are also independent of 
the Wij and the Ui. Then the term X^=i w ij x j{t) m ec f 
([2]) is a sum of many identically distributed independent 
variables, and the law of large numbers tells us that this 
sum is distributed ("IS R Gaussian, see fig. [T]). 

With this assumption we can compute the distribution 
of cii(t), and then using eq. (P) the distribut ion of Xi (t + 
1). This analysis will yield a very simple one dimensional 
recurrence, similar to the logistic map, which captures 
the essence of the dynamics of the echo state network. 

In more detail, we reason as follows. Because the func- 
tion / is odd, the distribution from which are drawn the 
Xi(t) has mean zero (xi(t)) — 0, where () denotes ensem- 
ble average, i.e. average over the index i at fixed time t. 
We denote the variance of the Xi(t) by (xf(t)) = cr 2 (t). 
Assuming that the w^ are drawn independently at ran- 
dom from a distribution with mean zero E [wij] = and 
variance 
9 as 



E [i 



w , and introducing the rescaled gain 



g 2 = Nw 2 

we find that the term Ylf=i w ij x j{t) ~ ^(0; 5 2(j2 (0) h as 
gaussian distribution. We now assume for simplicity that 
the Ui are drawn independently at random from a Gaus- 
sian distribution with zero mean. By rescaling s(t) we 
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— Mean field theory prediction 
i I Simulation 




-2 2 

a, = Activation potential 



Figure 1: Distribution of the activation potential a^. A reser- 
voir with size N = 1000 and normalized gain of g = 2 and no 
source was run for 200 time steps, then the histogram of the 
activation potential was ploted in green. For comparaison a 
guassian with the variance predicted by the mean field theory 
was ploted in blue. 



can, without loss of generality, assume that the gaus- 
sian has unit variance. Then the activation potential 
a«(*) = Ef^WijXjit) + Ui s(t) - N(0,g 2 a 2 (t) + s 2 (t)) 
also has a gaussian distribution. (If the Ui are drawn 
from a distribution other than Gaussian, then the distri- 
bution of the ai(t) can in principle be calculated, but the 
expressions will be more complicated) . For future use we 
denote the variance of the ai(t) as £ 2 (£) = g 2 a 2 (t) + s(t). 
Finally, the distribution of Xi at time t + 1 is given by 
Xi(t + 1) - f(N(0,T, 2 (t))). We thus obtain a closed 
one dimensional recurrence for the variances of Xi(t) and 
a l (t): 



Y,\t) = g 2 a 2 (t) + s 2 (t) 
a 2 (t + l) = F (£ 2 (£)) 



(3) 



where 



F(S 2 ) = / daf 2 (a) 



exp 



2S 2 



dx f(^x) 



exp 



'2tt 



(4) 



From the properties of /, it folllows that F (E 2 ) has the 
following properties: F (0) = 0, F (+oo) = 1, dF/dT? > 
0, dF/dY?{Y? = 0) = 1. 

We first consider the case when there is no source s 2 = 
0. When g < 1, there is a single stationnary solution 
fj 2 = E 2 = 0, corresponding to a quiescent system, g = 1 
corresponds to a branching point. When g > 1, the stable 
stationnary solution is different from zero: a 2 > 0, E 2 > 
0. In the limit of infinite gain g — > oo, we have <r 2 — > 1. 
E 2 ^ ff 2 . 
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E 2 = Variance of activation potential g-Gain 



Figure 2: Histogram of the variance E 2 of the activation po- 
tential in the presence of source. A reservoir of size N = 500 
with normalized gain g — 2 and a single i.i.d normal source 
with variance £ 2 = 1 was run for 200 time steps to remove the 
influence of the initial state. Then the reservoir was run for 
2,000 time steps, the variance of the activation potential E 2 (i) 
was collected, and its histogram ploted in red. The same pro- 
cedure was done following the mean field theory predicion of 
eq. ([2]) and ploted in green. 



When the source s(t) is non-zero, integrating the re- 
currence eq. ^ yields a distribution of values for a 2 and 
£ 2 . In the Figures, we take for illustrative purposes the 
s(t) to be independently drawn at each time t from the 
same probability distribution, that is we assume there are 
no temporal correlations between successive values of the 
source. For definitness we take the s(t) ~ 7V(0,<!; 2 ) to be 
distributed according to a Gaussian with zero mean and 
variance £ 2 . Comparison of the mean field theory and 
the exact distribution for u 2 (t) obtained by integrating 
the equations of motion is given in fig. [2J 

It is instructive to compare the above situation with 
the case where the echo state network is driven by many 
independent sources. In this case one replaces eq. © 
by a,(t) = ^2j—iWijXj(t) + Vi(t) with u<(t) taken to 
be independently drawn from the Gaussian distribution 
Vi(t) ~ N(0,v 2 ). In this case the recurrence © becomes 



a 2 (t + l) = F(T?(t)) 



(5) 
(G) 



This recurrence admits a stationnary solution. When 
v 2 > 0, the stationnary solution is always different from 
zero: a 2 > 0, T? > 0. For very small gain g — > we have 
S 2 = v 2 , a 2 = F(v 2 ). In fig. [3] we show that the variance 
of an echo state network driven by a single source is close 
to the stationary variance computed according to eqs. ([5j 
|6|): the time average of the variance is identical to the 
stationary solution, and the standard deviation is small. 

Note that in the limit of infinite gain g — > oo, we have 
a 2 —> 1, £ 2 — > g 2 . independently of the source. 



Figure 3: Variance of the activation potential E as a func- 
tion of normalised gain g. For each activation potential, a 
reservoir of size N = 500 with a single i.i.d. normal source 
with variance £ 2 = 0.2 was run according to eqs. |T] [5]) for 
200 time steps to eliminate transitory effects, then the vari- 
ance E 2 of the activation potential was recorded for 500 time 
steps. For each gain, the mean and standard deviation of E 2 
was computed. The mean is ploted in red, and mean ± one 
standard deviation are ploted in doted red. The green dashed 
curve (superimposed on the red curve in the figure) corre- 
sponds to the predictions of the mean field theory computed 
with N independent sources according to eqs. ([5] [6]). 



In the figures we took /(a) = tanh(a) whereupon the 
integral eq. Q yielding the sationnary solution F had 
to be carried out numerically. This case is of interest as 
it is the one most often used in previous works. It is 
however interesting to note that the function F(E 2 ) can 
be computed analytically in two cases. The first does not 
correspond to a sigmoid function /, but it is of interest as 
it has been used in a recent experimental implementation 
of reservoir computing [l5|, 0|: if /(a) = v / 2sin(-^), 

then F(E 2 ) = 1— cxp (S 2 ). The second case (obtained by 
first evaluating dF J^ \ see [IH) corresponds to a sigmoid 
function /: if /(a) = erf(^a), then F{T, 2 ) = -1 + 
£ arctan (Vl + 2E 2 ) . 

Stability. The mean field theory also allows a deriva- 
tion of the Lyapunov exponents of the echo state network. 
Consider two neighbouring solutions ai(t) and a' i (t) j with 

joint Gaussian distribution: a,(t) + a ' (t) - iV(0,£ 2 (i)), 
Oi(t) - a'^t) - N(0, 5 2 {t)). We take S 2 (t) small and wish 



to compute how it evolves with time. We have a,i(t + 
1) - a'S + 1) = E j wvf ( a - (t) r' (t) ) «*) ^ + 
O ((a — a') 2 ) fr° m which we derive that 



5 2 {t + l) = 6 2 (t)A(Z 2 (t)) + 0(S 3 {t)) 

exp 



A(S 2 ) = g 2 da f' 2 (a) 



2E- 



V2ttE 2 



(7) 
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t = Number of steps used to compute A g = Gain 



Figure 4: Convergence of Lyapunov exponent estimate. A 
single reservoir of size N = 500, normalized gain g — 2, and 
source variance £ 2 = 0.2 was run for 200 steps. Then the 
states were perturbed with i.i.d. gaussians with standard de- 
viation 10 -10 and the size of the pertubation was recored for 
100 time steps. The lyapunov exponent A was computed for 
the different lags: {/6 2 {t) /5 2 {0). The green line is the result 
of a simulation done according to eqs. fl] [2j, the doted red 
line is the result of a simulation done in the annealed approx- 
imation (see conclusion), the straight blue line is the mean 
field theory prediction computed from lff|). 



See fig. H] for a comparison between Lyapunov exponents 
computed from the exact equations of motion and from 
the mean field theory. 

From this expression two interesting properties imme- 
diately follow. First, in the absence of source the quies- 
cent stationnary solution a 2 (t) = £ 2 (i) = has largest 
Lyapunov exponents smaller than 1 for g < 1, and largest 
Lyapunov exponents larger than 1 for g > 1. (This fol- 
lows from the fact that the integral in eq. (J7J equals 1 
in the limit S 2 — > 0, since /'(0) = 1). Thus, in the ab- 
sence of source, g = 1 is the threshold for chaos in the 
dynamical system. 

Second, for sigmoidal functions /, we find the property 
(well known to those who use echo state networks for 
machine learning tasks) that increasing the strength of 
source term £ 2 stablizes the system. Indeed when £ 2 
increase, S 2 also increases. For sigmoidal functions /'(o) 
is a decreasing function of \a\, and hence the integral 
in eq. decreases when E 2 increases. See fig. [5] for 
illustrations of this prediction. 

Conclusion. In the present work we have shown that 
mean field theory can be applied to large random net- 
works driven by external signals. The agreement, at least 
in the case of echo state networks, with the exact inte- 
gration of the equations of motion is remarkably good. 

The mean field theory itself is closely related to the an- 
nealed approximation wherein the Wij (t) and the itj (t) are 
redrawn independently at random at each time t from the 
same distribution, i.e. the coupling coefficients become 



Figure b: A as a function of source volatility 
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Figure 5: Lyapunov exponent as a function of gain (panel 
a) and Lyapunov exponent as a function of source volatility 
(panel b). An N = 500, g — 2 reservoir was run 200 time 
steps, than perturbed by i.i.d. gaussians with standard devi- 
ation 10 -12 and run for 20 time steps. The lyapunov exponent 
was computed based on how much the pertubation changed, 
and ploted in red. The mean field theory prediction is in blue. 
In panel a the source variance is £ 2 = 0.2; in panel b the gain 
is g = 2. 



time dependent. We expect the mean field theory to be 
exact in the large N limit of the annealed approximation. 

Compared with the case when there is no external sig- 
nal, we recover a number of features which are well known 
empirically to people working with echo state networks, 
but which have not been derived analytically before. In 
particular we find that if in the absence of external sig- 
nal the system has a trivial stable state (corresponding 
in our analysis to the case g < 1), then in the presence 
of external signal the dynamics becomes non trivial. We 
also find that the presence of the external signal tends 
to stabilize the system (i.e. Lyapunov exponents which 
decrease when the external signal increases) . 

We also compare the cases where the dynamical system 
is driven by a many independent input signals, and by 
a single input signal. We find a qualitative difference. 
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Namely in the first case the mean field theory predicts 
that the collective variables take on stationnary values, 
whereas in the second case the mean field theory predicts 
their statistical distribution. 

The present work should provide the basis for further 
in depth understanding of the highly important case in 
practice of dynamical systems driven by time dependent 
external signals. 
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